Dataset description
There are two submissions: 10267 & 10270.
- In each submission, 2390 families with .vcf files are included.
- For each family, two VCF files are provided:
  - one named “sorted”;
  - the other named “annotated”.
Note that, combining 10267 and 10270, 2207 families have complete VCF files.
Submission 10267
FreeBayes was used for variant calling.
- Files named “sorted”: 852 families lack GL/PL information.
- Files named “annotated”: 984 families lack GL/PL information.
Note that for FID:13562, there is no father information.
Submission 10270
FamSeq was used for variant calling.
- For files named “sorted”, there is no GL/PL information.
- For files named “annotated”:
  - the VCF files of 582 families are empty;
  - the VCF files of 12 families contain fewer than 1000 variants.
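The GL/PL availability and variant counts reported for both submissions can be checked per file with a short script. The sketch below is hypothetical (summarize_vcf_lines and summarize_vcf are illustrative names, not part of the original pipeline); it assumes plain or gzip/bgzip-compressed VCF files and treats GL/PL as available when any record's FORMAT column lists them.

```python
import gzip


def summarize_vcf_lines(lines):
    """Count variant records and detect GL/PL in the FORMAT column.

    Hypothetical helper: `lines` is any iterable of VCF text lines.
    """
    n_variants = 0
    format_keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip meta-information, header and blank lines
        n_variants += 1
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 8:
            # column 9 (FORMAT) lists per-sample keys, e.g. GT:DP:GQ:PL
            format_keys.update(fields[8].split(":"))
    return {
        "n_variants": n_variants,
        "empty": n_variants == 0,
        "has_gl": "GL" in format_keys,
        "has_pl": "PL" in format_keys,
    }


def summarize_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF and summarize it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return summarize_vcf_lines(handle)


# Example on an in-memory VCF fragment (two records, PL present in one)
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tchild\n",
    "1\t1000\t.\tA\tG\t50\tPASS\t.\tGT:PL\t0/1:40,0,60\n",
    "1\t2000\t.\tC\tT\t45\tPASS\t.\tGT\t0/1\n",
]
summary = summarize_vcf_lines(example)
print(summary)
# {'n_variants': 2, 'empty': False, 'has_gl': False, 'has_pl': True}
```

A family's file could then be flagged when summary["empty"] is true or summary["n_variants"] < 1000, mirroring the counts given above for submission 10270.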
Call de novo mutations
Triodenovo was used to call de novo mutations:
- Only SNPs with GL/PL information were retained.
- Only trios were used (note the families with two or more probands).
- Filters: --minDP 7 --minDepth 10, with all other options left at their defaults.
- Post-filter: basic filtering for SNVs retained only sites meeting all of the following:
  - single-nucleotide sites with exactly two alleles;
  - QUAL >= 30;
  - both parents homozygous reference and the child heterozygous, with the child's heterozygous PL equal to 0;
  - the minimum PL of the child's other two genotypes >= 30, i.e. the genotype likelihood of the called heterozygous genotype is at least 1000 times that of either other genotype (the genotype likelihood is defined as P(R|G), where R represents the aligned bases and G the underlying genotype).
The script is stored at /30days/uqywan67/SSC/scripts/call_deno.R
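The basic SNV post-filter above can be written as a per-site predicate. This is a minimal sketch, not the code in call_deno.R: the site layout (a dict with REF, ALT, QUAL and father/mother/child GT and PL entries) and all function names are assumptions. It relies on the VCF PL ordering for a biallelic site, [0/0, 0/1, 1/1], and on the fact that a phred-scaled PL gap of d corresponds to a likelihood ratio of 10^(d/10), so a gap of 30 is a factor of 1000.

```python
BASES = {"A", "C", "G", "T"}


def pl_to_likelihood_ratio(pl_gap):
    """A PL gap of d (phred-scaled) equals a likelihood ratio of 10**(d/10)."""
    return 10 ** (pl_gap / 10)


def _alleles(gt):
    # normalize phased/unphased genotypes, e.g. "0|1" -> ["0", "1"]
    return sorted(gt.replace("|", "/").split("/"))


def is_hom_ref(gt):
    return _alleles(gt) == ["0", "0"]


def is_het(gt):
    return _alleles(gt) == ["0", "1"]


def passes_basic_snv_filter(site, min_qual=30, min_pl_gap=30):
    """Hypothetical per-site version of the post-filter described above."""
    ref, alts = site["REF"], site["ALT"]
    # single-nucleotide site with exactly two alleles (REF + one ALT)
    if len(alts) != 1 or ref not in BASES or alts[0] not in BASES:
        return False
    if site["QUAL"] < min_qual:
        return False
    # both parents homozygous reference
    if not (is_hom_ref(site["father"]["GT"]) and is_hom_ref(site["mother"]["GT"])):
        return False
    child = site["child"]
    if not is_het(child["GT"]):
        return False
    # PL order for a biallelic site: 0/0, 0/1, 1/1
    pl_hom_ref, pl_het, pl_hom_alt = child["PL"]
    return pl_het == 0 and min(pl_hom_ref, pl_hom_alt) >= min_pl_gap


site = {
    "REF": "A", "ALT": ["G"], "QUAL": 52.0,
    "father": {"GT": "0/0", "PL": [0, 33, 400]},
    "mother": {"GT": "0/0", "PL": [0, 30, 380]},
    "child": {"GT": "0/1", "PL": [45, 0, 310]},
}
print(passes_basic_snv_filter(site))  # True
print(pl_to_likelihood_ratio(30))     # 1000.0
```

Keeping the thresholds as keyword arguments makes it easy to test stricter settings without editing the filter logic.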
Annotation
ANNOVAR was used to annotate de novo mutations.
- The hg19refGene, exac and gnomad databases were used.
- Only exonic variants were retained.
- For now, the DNMs were cross-referenced with the public SSC DNMs (note: the specific filters warrant further exploration).
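The exonic filter and the cross-reference step could look like the following. This is a sketch assuming ANNOVAR's tab-delimited table_annovar.pl multianno output with the refGene protocol (columns Chr, Start, Ref, Alt, Func.refGene), plus a hypothetical SampleID column added downstream, and a public SSC DNM list already loaded as a set of (sample, chromosome, position, ref, alt) keys; the actual format of the public list is not specified here.

```python
import csv


def read_multianno(path):
    """Load an ANNOVAR *_multianno.txt (tab-delimited) file as a list of dicts."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle, delimiter="\t"))


def is_exonic(func_refgene):
    # Func.refGene may hold ';'-separated categories such as "exonic;splicing";
    # splitting avoids matching "ncRNA_exonic".
    return "exonic" in func_refgene.split(";")


def variant_key(sample_id, chrom, pos, ref, alt):
    # drop a leading "chr" so "chr1" and "1" compare equal
    chrom = chrom[3:] if chrom.lower().startswith("chr") else chrom
    return (sample_id, chrom, int(pos), ref, alt)


def exonic_public_dnms(rows, public_keys):
    """Keep exonic rows whose (sample, variant) key is in the public DNM set."""
    kept = []
    for row in rows:
        if not is_exonic(row["Func.refGene"]):
            continue
        key = variant_key(row["SampleID"], row["Chr"], row["Start"], row["Ref"], row["Alt"])
        if key in public_keys:
            kept.append(row)
    return kept


rows = [
    {"SampleID": "P1", "Chr": "chr2", "Start": "1500", "Ref": "C", "Alt": "T",
     "Func.refGene": "exonic", "ExonicFunc.refGene": "nonsynonymous SNV"},
    {"SampleID": "P1", "Chr": "chr3", "Start": "900", "Ref": "G", "Alt": "A",
     "Func.refGene": "ncRNA_exonic", "ExonicFunc.refGene": "."},
]
public_keys = {("P1", "2", 1500, "C", "T"), ("P1", "3", 900, "G", "A")}
print(len(exonic_public_dnms(rows, public_keys)))  # 1
```

Matching on sample as well as position avoids counting a public DNM from one child as a hit for a sibling carrying the same variant.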
Burden test analysis
De novo mutation counts
Note that the DNMs called by Triodenovo (exonic variants only) were further cross-referenced with the public SSC DNMs.
All families (1686 families)
A total of 1686 families were found to have DNM events.
- 1328 probands, with on average ~1.18 nonsynonymous and ~0.43 synonymous mutations per individual
- 944 siblings, with on average ~1.12 nonsynonymous and ~0.44 synonymous mutations per individual

Quad families (1352 families)
A total of 1352 families were found to have DNM events.
- 994 probands, with on average ~1.17 nonsynonymous and ~0.43 synonymous mutations per individual
- 944 siblings, with on average ~1.13 nonsynonymous and ~0.44 synonymous mutations per individual
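The per-individual averages above can be computed as totals divided by the number of individuals with at least one exonic DNM in each role. A minimal sketch, assuming one record per DNM of the form (family_id, individual_id, role, ExonicFunc.refGene) and treating stopgain and stoploss as nonsynonymous (an assumption; the notes do not say which categories were counted). The quad subset is obtained by passing a set of quad family IDs, which would come from the pedigree; the function name is hypothetical.

```python
from collections import defaultdict

# Assumed classification of ANNOVAR ExonicFunc.refGene values
SYNONYMOUS = {"synonymous SNV"}
NONSYNONYMOUS = {"nonsynonymous SNV", "stopgain", "stoploss"}


def mean_dnm_per_individual(records, families=None):
    """records: (family_id, individual_id, role, exonic_func) tuples, one per DNM.

    Returns {role: (n_individuals, mean_nonsynonymous, mean_synonymous)} over
    individuals with at least one record; `families` optionally restricts the
    analysis (e.g. to quad families).
    """
    counts = defaultdict(lambda: [0, 0])  # individual -> [nonsyn, syn]
    role_of = {}
    for family_id, individual_id, role, exonic_func in records:
        if families is not None and family_id not in families:
            continue
        role_of[individual_id] = role
        tally = counts[individual_id]  # registers the individual even if unclassified
        if exonic_func in NONSYNONYMOUS:
            tally[0] += 1
        elif exonic_func in SYNONYMOUS:
            tally[1] += 1

    by_role = defaultdict(list)
    for individual_id, tally in counts.items():
        by_role[role_of[individual_id]].append(tally)
    return {
        role: (len(t), sum(x[0] for x in t) / len(t), sum(x[1] for x in t) / len(t))
        for role, t in by_role.items()
    }


records = [
    ("F1", "P1", "proband", "nonsynonymous SNV"),
    ("F1", "P1", "proband", "synonymous SNV"),
    ("F1", "S1", "sibling", "nonsynonymous SNV"),
    ("F2", "P2", "proband", "nonsynonymous SNV"),
    ("F2", "P2", "proband", "stopgain"),
]
print(mean_dnm_per_individual(records))
# {'proband': (2, 1.5, 0.5), 'sibling': (1, 1.0, 0.0)}
print(mean_dnm_per_individual(records, families={"F1"}))
# {'proband': (1, 1.0, 1.0), 'sibling': (1, 1.0, 0.0)}
```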

Probands (1292 families)
A total of 1292 probands with an assigned RRB cluster were found to have DNM events.
- Cluster 1: N = 361
- Cluster 2: N = 602
- Cluster 3: N = 329

RRB phenotypes
Model: Phenotype (pre-adjusted) ~ DNM counts, fitted within each cluster.
- The results are consistent whether or not the phenotypes are pre-adjusted for IQ.
- The significant associations for phenotypes pre-adjusted for IQ are summarized below:
Note that none of these associations remains significant after Bonferroni correction (p < 0.05/25 = 0.002).
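The per-cluster model and the Bonferroni threshold can be sketched with the Python standard library alone. Assumptions: "pre-adjusted by IQ" is taken to mean using the residuals of phenotype regressed on IQ, each model is a simple least-squares regression of the (adjusted) phenotype on DNM counts, and the two-sided p-value uses a normal approximation to the t distribution. That approximation is reasonable for cluster sizes of several hundred but is not exact. The function names are hypothetical.

```python
import math
from statistics import NormalDist, fmean


def simple_ols(x, y):
    """Least-squares fit y = a + b*x; returns (a, b, residuals, se_b)."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    sigma2 = sum(r * r for r in residuals) / (n - 2)  # residual variance
    return a, b, residuals, math.sqrt(sigma2 / sxx)


def dnm_association(phenotype, dnm_counts, iq=None):
    """Slope and approximate two-sided p-value of phenotype ~ DNM counts.

    If `iq` is given, the phenotype is first replaced by its residuals
    from phenotype ~ IQ (assumed meaning of "pre-adjusted by IQ").
    """
    y = simple_ols(iq, phenotype)[2] if iq is not None else list(phenotype)
    _, slope, _, se = simple_ols(dnm_counts, y)
    z = slope / se
    # normal approximation to the t distribution with n - 2 df
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return slope, p_value


def bonferroni_significant(p_values, alpha=0.05):
    """Flag p-values below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p < threshold for p in p_values]


counts = [0, 1, 2, 3, 4, 5]
score = [1.0, 2.1, 2.9, 4.2, 5.1, 5.8]
slope, p = dnm_association(score, counts)
print(round(slope, 2))  # 0.98
flags = bonferroni_significant([0.001, 0.004] + [0.5] * 23)  # 25 tests -> 0.002
print(flags[:2])  # [True, False]
```

In practice dnm_association would be run once per RRB cluster and phenotype, and the 25 resulting p-values passed together to bonferroni_significant. For exact t-based p-values, a statistics package would be the better choice.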